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ABSTRACT 

We use a semi-analytical approach to simulate absorption spectra of QSOs at high redshifts 
with the aim of constraining the cosmic reionization history. We consider two physically mo¬ 
tivated and detailed reionization histories: (i) an Early Reionization Model (ERM) in which 
the intergalactic medium is reionized by PopIII stars at z ~ 14, and fii) a more standard Late 
Reionization Model (LRM) in which overlapping, induced by QSOs and normal galaxies, 
occurs at z « 6. From the analysis of current Lya forest data at z < 6, we conclude that 
it is impossible to disentangle the two scenarios, which fit equally well the observed Gunn- 
Peterson optical depth, flux probability distribution function and dark gap width distribution. 
At z > 6, however, clear differences start to emerge which are best quantified by the dark 
gap and peak width distributions. We find that 35 (zero) per cent of the lines of sight within 
5.7 < z < 6.3 show dark gaps widths > 50A in the rest frame of the QSO if reionization is 
not (is) complete at z > 6. Similarly, the ERM predicts peaks of width ~ lA in 40 per cent 
of the lines of sight in the redshift range 6.0 — 6.6; in the same range, LRM predicts no peaks 
of width > 0.8A. We conclude that the dark gap and peak width statistics represent superb 
probes of cosmic reionization if about ten QSOs can be found at z > 6. We finally discuss 
strengths and limitations of our method. 

Key words: intergalactic medium quasars: absorption lines - cosmology: theory large-scale 
structure of Universe. 


1 INTRODUCTION 

After the recombination epoch at z ~ 1100, the universe remained 
almost neutral until the first generation of luminous sources (stars, 
accreting black holes, etc) were formed. The photons from these 
sources ionized the surrounding neutral medium and once these in¬ 
dividual ionized regions started overlapping, the global ionization 
and thermal state of the intergalactic gas changed drastically. This 
is known as the reionization of the universe, which has been an 
important subject of research over the last few years, especially be¬ 
cause of its strong impact on the formation and evolution of the 
first cosmic structures (for a comprehensive review on the subject 
of reionization and first cosmic structures, see Ciardi & Ferrara 
2005). Calculations based on the hierarchical structure formation 
in cold dark matter (CDM) models, predict that reionization should 
naturally occur somewhere between z ~ 6 — 15 (Chiu & Ostriker 
2000; Gnedin 2000). 

Recent observational progresses, however, seem to indicate 
that the reionization history at high redshifts might have been a 
complicated process. To start with, WMAP observations of the 
temperature-polarization cross correlation at large scales for the 
CMB suggest a Thomson optical depth as high as r e ~ 0.17, which 
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implies that reionization might have occurred at redshifts as high 
as z ~ 15 (Kogut et al. 2003; Spergel et al. 2003). On the other 
hand, the spectroscopy of the Lya forest for QSOs at 2 > 6 dis¬ 
covered by the Sloan Digital Sky Survey (SDSS; Fan et al. 2001; 
Fan et al. 2003; Fan et al. 2005) seems to indicate that the ioniza¬ 
tion state of the intergalactic medium (IGM) might be very differ¬ 
ent along different lines of sight. For example, the analyses of the 
spectrum of the most distant known quasar (SDSS J1148+5251) 
show some residual flux both in the Lya and Ly j3 troughs, which 
when combined with Ly 7 region (Furlanetto & Oh 2005), imply 
that this flux is consistent with pure transmission. The presence of 
unabsorbed regions in the spectrum corresponds to a highly ion¬ 
ized IGM along that particular line of sight. However. Becker et al. 
(2001) detected a complete Gunn-Peterson trough in the spectrum 
of SDSS J1030+0524 (z = 6.28), where no transmitted flux is 
detected over a large region (300 A) immediately blueward of the 
Lya emission line. This result has been shown to be consistent with 
a hydrogen neutral fraction /hi > 10~ 3 (Fan et al. 2002, hereafter 
F02). 

There have been other different approaches to investigate the 
neutral hydrogen fraction. Wyithe, Loeb, & Carilli (2005) and 
Wyithe & Loeb (2004) have estimated the sizes of the ionized re¬ 
gions around 7 QSOs at z > 6 (which included the above cited 
QSO). Following that they considered the neutral gas surround¬ 
ing the QSO as a function of different parameters: the Stromgren 
sphere size Rs, the quasar’s production rate of ionizing photons 
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-Wphot- the clumping factor of the gas C and the age of the QSO 
f a ge. According to their arguments, the small sizes of the HII re¬ 
gions (~ 10 physical Mpc) imply that the typical neutral hydrogen 
fraction of the IGM beyond z ~ 6 is in the range 0.1 - 1. How¬ 
ever, this approach is weighted down by several uncertainties. For 
example, one of the uncertainties is the quasar’s production rate 
of ionizing photons IVphot as it depends on the shape of the spec¬ 
tral template used. Moreover it is implicitly assumed in the mod¬ 
elling of clumping factor that the formation of quasars and galaxies 
were simultaneous. This in turn implies that quasars ionize only 
low density regions and hence the clumping factor which regulates 
the evolution of the HII regions is low. If, instead, stars appears 
much earlier than QSOs, the quasars have to ionize high density re¬ 
gions, which means that one should use a higher value of clumping 
factor in the calculations (Yu & Lu 2005). 

Mesinger & Haiman (2004) have used a different approach 
based on the damping wings of the neutral hydrogen. Using den¬ 
sity and velocity fields obtained by hydrodynamical simulation, 
they computed the Lya absorption as a function of wavelength. In 
this case the neutral hydrogen fraction. (Vphot and Rs are treated 
as free parameters, constrained by matching the optical depth ob¬ 
served in the QSO SDSS J1030+0524. Also in this case they find 
a neutral hydrogen fraction larger than 10 per cent, i.e. the IGM is 
significantly more neutral at z ~ 6 than the lower limit directly ob¬ 
tainable from the GP trough of the QSO spectrum (10 -3 ). However 
this result is based only on one quasar. Moreover the observational 
constraints on the optical depth are very uncertain and can intro¬ 
duce errors in the estimates of /hi . 

The fact that we find transmission along some lines of sight 
while the medium seems quite neutral along others possibly has 
been interpreted that the IGM ionization properties are different 
along different lines of sight at z > 6, thus suggesting that we 
might be observing the end of the reionization process. 

Finally, the evolution of the luminosity function of Lya emit¬ 
ters (Malhotra & Rhoads 2004; Furlanetto, Zaldarriaga, & Hern- 
quist 2005; Haiman & Cen 2005) suggests that the neutral fraction 
of hydrogen at z — 6.5 should be greater than 50 per cent (Malho¬ 
tra & Rhoads 2005). 

From a theoretical point of view several efforts have been 
made in the last few years in order to reconcile WMAP and SDSS 
measurements (e.g. Chiu et al. 2003; Haiman & Holder 2003; 
Gnedin 2004). In particular. Choudhury & Ferrara (2005, hereafter 
CF05) have showed that a self-consistent model for reionization, 
which agrees with various sets of observations over a wide redshift 
range, predicts a highly ionized universe at z « 6. According to the 
model, the rise in the GP optical depth towards z = 6 is achieved 
by assuming a drop of the photoionization rate caused by the disap¬ 
pearance of first generation metal free (hereafter PopIII) stars. To 
explore this idea, it becomes important to verify whether the dif¬ 
ference in the GP trough along different lines of sight at z > 6 
can be explained by assuming a highly ionized universe. In other 
words we pose the question: is it possible that the IGM is overall in 
a highly ionized state and the differences observed in QSO spectra 
arise simply due to the cosmic variance in density fluctuations? 

Since at redshifts larger than 5, the transmission in the spectra 
is extremely low. the mean transmitted flux being F meSin < 0.182 
(Songaila 2004. hereafter S04), it is not possible to study the prop¬ 
erties of the Lya forest through the usual Voigt profile analysis. 
Alternative quantities have been used to characterize the spectra 
and to compare them with models, such as the probability distri¬ 
bution function (PDF) of the flux (F02; Songaila & Cowie 2002, 
hereafter SC02) and more importantly, the distribution of the dark 


gap widths. It turns out that the length of dark gaps at high red- 
shifts can be quite large (~ 80 Mpc comoving, which corresponds 
to ~ 30A in the rest frame wavelength at redshift 6), and hence 
the modelling requires a large sample of very long lines of sight 
(say, > 100 Mpc). Such requirement is beyond the reach of cur¬ 
rent numerical simulations and can only be fulfilled through semi- 
analytical calculations. 

In this paper we use semi-analytical techniques to produce ar¬ 
tificial spectra of the Lya forest in the redshift range 5.0 < z < 
6.6. We have considered different reionization scenarios, namely, 
(i) an early reionization model (CF05), characterized by an highly 
ionized IGM for z < 14, and (ii) a late reionization model, in which 
the overlapping of ionized regions is completed only by z ~ 6. The 
main goal of this paper is to identify a statistics for the Lya forest 
at z > 6 which can be used as an effective tool for discriminating 
between the two scenarios above. 

The outline of this paper is as follows. Section 2 presents the 
formalism to simulate the Lya flux from the IGM density and pe¬ 
culiar velocity fields. In addition, we also describe the two reion¬ 
ization scenarios in detail. Section 3 contains the comparison of our 
models with observational data at z < 6. We introduce various sta¬ 
tistical tools for the Lya forest and use our models to make testable 
predictions for z > 6. Finally we summarize our conclusions in 
Section 4. 


2 SIMULATING THE QSO ABSORPTION SPECTRA 

The Lya forest, observed in the absorption spectra of distant 
quasars, is the result of photon absorption through Lya transition 
of neutral hydrogen gas in the intergalactic medium along the line 
of sight (LOS). It is believed that the Lya forest arises from low- 
amplitude fluctuations in the underlying baryonic density field (Bi, 
Borner & Chu 1992). In this work, we shall use the formalism de¬ 
veloped by Bi & Davidsen (1997), Choudhury, Srianand, & Pad- 
manabhan (2001) and Viel et al. (2002) to simulate random LOS 
realizations of the Lya forest as observed in the quasar absorption 
spectra. 

2.1 Neutral hydrogen density and the transmitted flux 

Let us first summarize the steps for calculating the neutral hydrogen 
density field, optical depth and the transmitted flux for the Lya 
forest: 

2.1.1 Linear density and velocity fields for baryons 

In the linear regime, both the density and velocity fields for baryons 
are gaussian and are completely determined by the corresponding 
power spectra and correlation functions. The linear density power 
spectrum for baryons in one dimension is given by 

n (1) *) = d <l 9 W?(q, z) P^(q) (1) 

/ Q\ 

where i s the dark matter power spectrum in three dimen¬ 

sions at redshift z = 0, D(z) is the growth factor of linear DM den¬ 
sity fluctuations [normalized so that D(z = 0) = 1] and Wb(k, z) 
is the low-pass filter which suppresses baryonic fluctuations at 
small scales. It is well known that the exact form of Wb(k, z) de¬ 
pends on the ionization and thermal history of the universe, and 
one should, in principle, couple the evolution of the baryonic fluc¬ 
tuations to the reionization history. Since this is computationally 



Constraining the reionization history with QSO absorption spectra 3 


quite complex, one is led to use various approximate forms for the 
filter function. It turns out that if the temperature of the IGM is 
smoothly increasing with redshift and does not undergo any abrupt 
change (which is the case for our models), the filter function can be 
assumed to be of the form 


W b (k,z) 


1 

1 + xl(z)k 2 


( 2 ) 


where Xb(z) denotes the comoving scale below which fluctuations 
are suppressed: 


x b (z) 




2 7 k B T m {z) 1 1/2 
3/rm p fl m (l + z) 


(3) 


The constant /x is the mean molecular weight of the IGM, given 
by /x = 2/(4 — 3Y), where Y = 0.24 is the helium mass frac¬ 
tion (this relation assumes that the IGM consist mostly of fully ion¬ 
ized hydrogen and singly ionized helium). In equation (3) ks is the 
Boltzmann constant, m p is the hydrogen mass, is the density 
parameter for non-relativistic matter, T m is the mass-averaged tem¬ 
perature and 7 is the ratio of specific heats. Note that we are using 
the mass-averaged temperature T m which is considerably higher 
than the conventionally used temperature at the mean gas density 
To- This is motivated by the fact that baryonic density fluctuations 
calculated using the mass-averaged temperature are in a much bet¬ 
ter agreement with the results of numerical simulations. The evo¬ 
lution of Tm with z has been computed as described in CF05, in 
Section 5. 

The CDM power spectrum in three dimensions is taken to be 
P®(fc) = Adm k n T^ M (q), where n is the spectral index and 
Tdm(q) is the CDM transfer function (Bardeen et al. 1986): 


ln(l + 2.34q) 

TdmW - 2.34q X 

1 + 3.89q + (16.lq) 2 + (5.46q) 3 + (6.71 q) A 


- 1/4 


(4) 


with q = k/(h Mpc 1 )T 1 . The shape parameter V depends on 
the Hubble parameter, ti m and Q b (Sugiyama 1995): 


T = Qmh exp 




(5) 


The normalization parameter Adm is fixed through the value of as 
(the rms density fluctuations in spheres of radius 8 ft^ 1 Mpc). 

The computation of the linear baryonic peculiar velocity field 
f P ec(a:, z) is similar to that for the density field. We use the linear 
power spectrum for the velocity field in one dimension given by 


P^\k,z) = 


D(z) 


1+2 


k 2 ^ j^ w h q ,z) p Sl{q) (6) 


In addition, we do take into account the fact that the velocity field 
is correlated with the density field 


(k, z) = ^k± -^-Wb(q, 2 )P® (q). (7) 

Given the above relations and using the properties of gaussian ran¬ 
dom fields, one can generate the density and peculiar velocity fields 
for baryons in the linear regime. These are discussed in details in Bi 
& Davidsen (1997), Choudhury, Srianand, & Padmanabhan (2001) 
and Viel et al. (2002). 


2.1.2 Quasi-linear density field for baryons 


The above analysis is done in the framework of linear perturba¬ 
tion theory, while to study the properties of the IGM one has to 
take into account nonlinearities in the density distribution. To gen¬ 
erate the mildly non-linear regime of the IGM local density we use 
a lognormal model, introduced by Coles & Jones (1991) for the 
nonlinear matter distribution in the universe and widely adopted 
later (Bi 1993; Bi & Davidsen 1997; Choudhury, Padmanabhan, & 
Srianand 2001; Choudhury, Srianand, & Padmanabhan 2001; Viel 
et al. 2002) for the case of IGM. The lognormal distribution of the 
baryonic density field is given by 


n b {x,z ) 


no( 2 )exp 


S b (x,z) 


(S b (x,z)} 2 

2 


( 8 ) 


where 8 b is the linear baryonic density contrast and no ( 2 ) is the 
mean IGM density which is related to the baryonic density param¬ 
eter Q, b and the critical density p c through the relation 

no(z) = i^2(l + z) 3 . (9) 

p.m p 

We should mention here that the lognormal model is found to 
under-predict the number of high density regions compared to what 
is found in numerical simulations (CF05), and hence is inadequate 
for describing the highly non-linear densities. However, since these 
high density regions are quite rare, we expect them to have little ef¬ 
fect on the statistics of Lya forest performed in this paper. Thus, as 
far as this work is concerned, the lognormal approximation should 
work reasonably well for studying the properties of Lya forest. 
The accuracy of the lognormal approximation has been validated 
by Choudhury, Srianand, & Padmanabhan (2001) and CF05; the 
model is able to match various sets of observations simultaneously 
over a wide range of redshifts. For the purpose of this paper, we 
have also carried out some additional comparisons of the lognor¬ 
mal model with HydroPM simulations (Gnedin & Hui 1998) and 
found that, as far as the flux statistics used in this paper (the PDF 
and the distribution of dark gap widths) are concerned, the model 
gives quite good agreement with simulations. The details of such 
comparisons are given in Appendix A. 


2.1.3 Neutral hydrogen distribution 


The low density gas which gives rise to the Lya forest is approxi¬ 
mately in local equilibrium between photoionization and recombi¬ 
nation, expressed by the relation 


a[T(x, z)]n p (x, z)n e (x, z) =V w _i{x,z)rv R i(x,z) (10) 


where a(T) is the radiative recombination rate, n e and n p are the 
number density of electrons and protons respectively and Thi is 
the photoionization rate of neutral hydrogen. The characteristic low 
density of the IGM allow us to neglect the collisional ionizations. 

It is useful to define the neutral hydrogen fraction /hi : 


/hi (x,z) 


n H i(a :,z) 
hh(x,z) 


1.08 


nmjx, z ) 
n b (x,z) 


(ID 


where the factor 1.08 arises because of the presence of helium. To 
solve the ionization equilibrium equation ( 10 ) in an exact manner 
we need to know the precise ionization state of helium (which af¬ 
fects the number density of electrons n e ). However we can ob¬ 
tain the neutral fraction in the two extreme cases which are dis¬ 
cussed next. Usually, the ionization conditions in the Lya forest at 
3.5 < 2 < 5.5 are similar to those of HII regions with /hi < 10~ 4 ; 
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furthermore in such epochs, helium is mostly in a singly-ionized 
state. Thus with the approximation /hi <C 1, equation (10) gives 




1.08 


a[T(x,z)\ 
Thi(x, z) 


Uh{x, 


z) 


a[T{x,z)] 
Thi(*, z) 


rib{x,z).( 12) 


However, at higher redshifts (say, z > 5.5), one has to consider the 
possibility that the IGM is not completely ionized and there remain 
regions with a high neutral fraction. Such regions are opaque to 
ionizing radiation, and hence the effective photoionization rate in 
such regions can be taken to be zero; it follows that /hi ~ 1. 

The recombination coefficient at temperature T is given by 
(Rauch et al. 1997) 


a[T(x,z)\ =4.2 x 1CT 13 


T{x,z) 
10 4 IT 


3 -1 
cm s 


(13) 


For quasi-linear IGM, where non-linear effects like shock-heating 
can be neglected, the temperature T(x,z) can be related to the 
baryonic density through a power-law relation (Hui & Gnedin 
1997; Schaye et al. 2000) 


T(x,z ) = T 0 (z) 


nb(x,z) l 7 1 
no 0) 


(14) 


where Tq(z) is the temperature of the IGM at the mean density. 

The slope 7 of the equation of state depends on the reion¬ 
ization history of the universe (Theuns et al. 1998; Hui & Gnedin 
1997). The value of 7 and its evolution are still quite uncertain and 
hence in this work we will consider it as a free parameter and ignore 
its redshift evolution. 


2.1.4 Optical depth and transmitted flux for the Lya forest 


The transmitted flux F due to Lya absorption in the IGM is com¬ 
puted from the usual relation F = e -1Xy “, where TL ya is the Lya 
absorption optical depth. The value of the optical depth at a redshift 
zo is given by 


X Lye* ( Zq ) — 


da 


V 


J d x(z 


nm{x(z),z) 
b(x(z),z)( 1 + 2 ) 


vh(zq) - vh(z) - Vpec (x(z),z) 


b(x(z),z) 


(15) 


where vh(zo) — vh(z) = c( 2 o — z)/( 1 + 20 ) denotes the dif¬ 
ferential Hubble velocity between two points along the LOS. The 
quantity I a = 4.45 x 1CF 18 cm 2 . 


b(x, 2 ) 


' 2k B T(x, 2 ) ! 1/2 
m p 


(16) 


is the Doppler parameter, V is the Voigt function and a measures 
the natural line width of Lya transition. The quantity x{z) denotes 
the comoving distance to a point along the LOS at a redshift 2 : 


X{z) = l dZ 'l^) (17) 

In this work, each LOS is discretized in a number of pix¬ 
els lVpi x . At each pixel, we calculate the neutral hydrogen density 
riHi and the peculiar velocity v pec through the procedure discussed 
above. Then the optical depth at a pixel i is given by 


TLyct(i) = d a —— ^ ™HI (j) $a [v H (*) - v(J)] 


1 + 2 


1=1 


(18) 


transition. For low column density regions, the natural broadening 
is not that important, and the Voigt function reduces to a simple 
Gaussian 


^a[v H {i) -!>(/)] 


7 = 77 exp 
sPTb{j) 


( VH{i) -v{j) 

V b ti) 


2 


• (19) 


However, one should keep in mind that while dealing with highly 
neutral regions (which is relevant for the late reionization scenario), 
the Gaussian approximation for the line profile is not valid, and 
one has to use the appropriate form for the Voigt profile. In regions 
away from the center, the profile is given by the Lorentzian form: 


+<*[«#(«) - v(j)] 



- v{j)) 2 + R 2 ] 


( 20 ) 


where R a = AaXa/Lrv with A a being the decay constant for the 
Lya resonance and A a is the wavelength of the Lya line. 

Similar expressions follow for Ly/3 absorption lines too. In 
particular equation (18) is replaced by 


TLyf){i) 


clp 


A* 

1 + 2 


-^pix 

^2 nm(j)®p[v H (i) 
3= 1 


v{j)] 


( 21 ) 


where 1,3 = (fhypXLyp)/ (fhyaAhya)Ia, with 

(fhyp^hy 0 )/{fhyaXLya) = 0.16 being the ratio of the product 
between the oscillator strength and the resonant scattering wave¬ 
length for Ly f3 and Lya; $p is the Voigt profile for Ly f3 transi¬ 
tion. For low column density systems, has the Gaussian form 
and it is essentially the same as 4? a . Hence, for such systems, we 
haveTL y / 3 (/) = 0.16rL ya (*). However, the situation is different for 
highly neutral regions, where depends on the decay constant for 
the resonance line and thus is different from <b Q . In particular, for 
regions away from the center, the profile is of the Lorentzian form: 


d?/3 [»«(•) 


v(j)] = 


_ Rp _ 

n[(v H (i) - v(j)) 2 + Rj] 


( 22 ) 


where Rp = ApXp/4-K. 


2.1.5 Simulation of observational and instrumental effects 

In this work, we compute various statistical quantities related to the 
Lya forest using our simulated spectra, such as (i) the evolution of 
the Gunn-Peterson optical depth (tgp), (ii) the Probability Distri¬ 
bution Function (PDF) and (iii) the Dark Gap Width Distribution 
(DGWD), which can then be compared with observational results. 
To make sure that the simulated spectra contain the same artifacts 
as the observed ones, we take into account the broadening of lines 
due to instrumental profile, the pixel size and noise. In this regard, 
we first convolve each simulated spectra with a Gaussian with a 
Full Width at Half Maximum (FWHM) corresponding to the reso¬ 
lution of the instrument used for observations. We than re-sample 
each line to varying pixel size. We finally add noise to the simulated 
Lya forest spectra corresponding to the observed data in a manner 
that the flux F in each pixel is replaced by F —> F + cr n0 iseG(l), 
where <7 no i ae = 0.02 and G(l) is a Gaussian random deviate with 
zero mean and unit variance. We have also studied the effect of 
varying the FWHM, the pixel size and CT no ise on different statistical 
quantities and shall comment on them wherever appropriate. 

2.2 Reionization models 


where Aa: is the comoving pixel size, v(i) = vh{i) + u pec (i) is As it is clear from the above discussion, the simulation of the 

the total velocity in the pixel i and 4?^ is the Voigt profile for Lya Lya forest spectra requires the knowledge of a few free parame- 
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ters. For the background cosmological model and the CDM power 
spectrum, we use the best-fit values given by WMAP experiment 1 . 
This leaves us with the parameters related to the IGM, which are 
T m (z), To(z), THi(a;, z) and 7 ( 2 ). One approach could be to treat 
them as free parameters and try to constrain them by comparison 
with observations. Fiowever, the evolution of all the above param¬ 
eters depends on the detailed ionization and thermal history of the 
IGM and can be quite complex - hence constraining the parame¬ 
ters over a wide redshift range is not straightforward. The other ap¬ 
proach is to use a self-consistent model for thermal and ionization 
history of the universe and calculate the globally-averaged values 
of the above quantities. 

In this paper, we take the second approach and use the semi- 
analytical model of CF05 to obtain the globally-averaged values 
of Tm, To, Fhi at different redshifts. The model implements most 
of the relevant physics governing the thermal and ionization his¬ 
tory of the IGM, such as the inhomogeneous IGM density distri¬ 
bution, three different classes of ionizing photon sources (massive 
PopIII stars, PopII stars and QSOs), and radiative feedback inhibit¬ 
ing star formation in low-mass galaxies. The main advantage of 
the model is that its parameters can be constrained quite well by 
comparing its predictions with various observational data, namely, 
the redshift evolution of Lyman-limit absoiption systems (Storrie- 
Lombardi et al. 1994), Gunn-Peterson (Songaila 2004) and elec¬ 
tron scattering optical depths (Kogut et al. 2003), temperature of 
the IGM (Schaye et al. 1999) and cosmic star formation history 
(Nagamine et al. 2004). 

According to the above model, the redshift of reionization is 
identified with the onset of the post-overlap stage (Gnedin 2000), 
which is defined as the epoch where the volume filling factor of 
ionized hydrogen in low-density regions (with overdensities less 
than a few tens) reaches unity (Qhii = 1). Following this, the 
ionized regions start propagating into the neutral high density re¬ 
gions, which is manifested as the evolution in the specific number 
of Lyman-limit systems. In what follows we will consider two dif¬ 
ferent reionization scenarios: 

Early Reionization Model (ERM) 

This model, which refers to the fiducial model described in CF05, is 
characterized by an highly ionized IGM at redshifts 2 < 14. In this 
scenario, an early population of massive metal-free (PopIII) stars 
ionize hydrogen at high redshifts, thus producing the high electron 
scattering optical depth observed by WMAP. The PopIII stars start 
disappearing at ztrans ~ 10; at lower redshifts, PopII stars and 
QSOs contribute to the ionizing background with a large number 
of photons with energies above 13.6 eV which are able to main¬ 
tain the ionized state of hydrogen. In this model we assume the 
photoionization rate to be spatially homogeneous and equal to the 
globally averaged value, i.e., Fhi (a;, 2) = Fhi (2) 

In the context of the present work, where we are concerned 
with state of the IGM at 5.5 < 2 < 6.5, the ERM corresponds to 
a highly ionized IGM at these redshifts. While it is true that this 
model can explain a large number of observational constraints, it is 
in contradiction with the analyses predicting that the IGM could be 
in a highly neutral state at 2 > 6 (Wyithe, Loeb, & Carilli 2005; 


1 Throughout this paper we will assume a flat universe with total matter, 
vacuum, and baryonic densities in units of the critical density of f2 m = 
0.27, Ha = 0.73, and fl^h 2 = 0.024, respectively, and a Hubble con¬ 
stant of Ho = 100// km s -1 Mpc —1 , with h = 0.72. The parameters 
defining the linear dark matter power spectrum are erg = 0.9, n = 0.99. 
dn/d In k = 0. 


Wyithe & Loeb 2004; Mesinger & Haiman 2004). Hence it be¬ 
comes necessary to consider an alternative model for reionization 
where the IGM is predominantly neutral at z > 6 . 

Late Reionization Model (LRM) 

The main motivation to consider this model is to verify whether 
the Lya forest can still be used to determine the ionization state 
of the IGM at z > 6 . In this model, the hydrogen distribution 
in the low-density IGM is characterized by two distinct phases 
at z > 6 , namely an ionized (HII) phase with a volume filling 
factor Qhii and a neutral (HI) phase with a volume filling factor 
1 — Qhii, with the evolution of Qhii and other physical parame¬ 
ters, [T m (z), To(z), Thi (2)] being calculated self-consistently us¬ 
ing the model of CF05. To achieve this two-phased state, we con¬ 
sider an ionizing background different from ERM one. In fact, the 
main (and only) difference between ERM and LRM is that the 
PopIII stars do not play an efficient role for reionization in LRM 
and as a consequence the IGM remains neutral up to redshift 6 , un¬ 
til the contribution of PopII stars to the UV background starts be¬ 
coming substantial. In passing, we should mention that in the ERM 
the electron scattering optical depth is r e = 0.17, in perfect agree¬ 
ment with the high value measured by WMAP, while in the LRM 
it is T e = 0.06, value which is lower than the 2 o limit allowed by 
WMAP. 

Model comparison 

To compare the global properties of the two reionization models, 
we plot the evolution of the volume filling factor of ionized regions 
Qhii(z) in the left panel of Figure 1. It is clear that the two models 
differ only at z > 6; for the LRM Qhii evolves from 0.7 to unity 
in the redshift range 6 . 6 - 6 . 0 , implying that the universe is in the 
pre-overlap stage, while for the ERM, Qhii = 1 at these epochs. 

We next compare the evolution of the globally volume- 
averaged photoionization rate for the two models in the middle 
panel of Figure 1. At z < 6 the ionizing sources are mainly PopII 
stars and QSOs for both models and so Fhi is comparable in the 
two models. At z > 6 , there are no contributions from PopIII 
stars to the UV background radiation in the LRM, which are in¬ 
stead present in the ERM. As a consequence, at high redshift, Fhi 
is higher in the ERM with respect to the late reionization one. The 
photoionization rate evolution is also compared with the results ob¬ 
tained by F02 and Bolton et al. (2005), hereafter B05. The agree¬ 
ment is quite good with B05 data at z < 4; the mild discrepancy 
at z « 4 could be attributed to the systematic overestimation of 
the photoionization rate due to the limited box size and resolution 
of their hydrodynamical simulations. On the other hand, the ERM 
mildly violates the upper limit of Fhi = 0.08 at z = 6.0 obtained 
by F02. However F02 use Fhi as a free parameter (same as in B05) 
to match the mean transmitted flux at high redshift, whose value 
has a large uncertainty (see Section 3.1.1). So we expect that also 
the values of Thi at z > 4 are associated with uncertainties at 
least as large as for the estimates at lower redshifts. This alleviates 
the mild discrepancy between the ERM and the upper limit on Fhi 
suggested by F 02 at z = 6 . 

The globally volume averaged neutral hydrogen fraction /hi 
is shown in the right panel of Figure 1. The evolution of /hi with 
redshift has been obtained computing 10 LOS for eight redshift 
bins (Az = 0.4) covering the redshift interval 2.8-6.7. Figure 1 
shows that, as expected, the IGM is highly ionized for the ERM, 
while it is quite neutral at z > 6 for the LRM. Furthermore, and 
opposite to the ERM case, we find a sharp evolution in the neutral 
fraction around 5.5 < z < 6.5 for the LRM when overlapping oc¬ 
curs in that model. Our prediction on the neutral hydrogen fraction 
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Figure 1. Evolution of the volume filling factor of ionized regions (left), the globally volume-averaged photoionization rate in units of 10' - 12 s-\r _ 12 = 
Fhi/ 10 - 12 s —1 (middle) and the neutral hydrogen fraction (right) for the early (ERM) and late (LRM) reionization models. The points in the middle panel 
show results obtained by Bolton et al.(2005) (B05, filled squares) and Fan et al.(2002) (F02, filled triangles) using hydrodynamical and N-body simulations, 
respectively. In the right panel, thick lines represent average results over 10 LOS, while the thin lines denote the upper and lower neutral hydrogen fraction 
extremes in each redshift interval. 
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Figure 2. Temperature evolution of the neutral regions. The solid line refers 
to the model with X-ray heating described in the text; the dotted line de¬ 
scribes the temperature evolution assuming adiabatic cooling. 


at z = 6, considering cosmic variance, is in agreement with the re¬ 
sult obtained by F02, while there is a discrepancy with the measure 
of /hi arising from the analyses of the HII regions (Wyithe, Loeb, 
& Carilli 2005; Wyithe & Loeb 2004; Mesinger & Haiman 2004) 
as discussed in the Introduction. 


to relate it to the globally volume-averaged photoionization rate 
r H i(z) is through the relation: r™( 2 ) = F H i(2)/Qhii(2). 

Also to be noted is that in absence of ionizing radiation, the 
temperature To of the mean density neutral gas will decrease adia- 
batically and can be as low as ~ 1 K at z « 6 ; however, the pres¬ 
ence of a population of hard photons (say, soft X-ray photons from 
QSOs, X-ray binaries or supernova remnants), which are able to 
penetrate the atomic medium, can raise To for these neutral regions 
(Chen & Miralda-Escude 2004). To investigate this possibility we 
study the evolution of To for neutral regions in the expanding Uni¬ 
verse, integrating the following equation: 


n i -A^ 0 ot> 2 Ttot Atot 
{1 + Z) ^- 2T °~3 H(z)n b k B 


(23) 


where r tot and A tot are the total heating and cooling rates, respec¬ 
tively. Assuming that the main heating mechanism is due to soft 
X-ray photons, Ttot can be substituted by Tx, where Tx is the 
X-ray heating rate. Following Chen & Miralda-Escude (2004), we 
parametrize the emissivity in X-rays in terms of the fraction of en¬ 
ergy that is emitted in X-rays compared to the energy emitted at 
Lya per unit log u, which we designate as ax - The X-ray heating 
rate is then given by the following relation: 


T x {z) = 0.14 a x h P v a e(z ) 


(24) 


where 0.14 is the fraction of the X-ray energy used to heat the gas 
(Shull & van Steenberg 1985), hp is the Planck constant, v a is the 
frequency of the Lya transition and e is the comoving Lya emis¬ 
sivity as obtained in the LRM. 

We start solving the differential equation (24) from z s tart = 
30 assuming that, at this redshift e(z) « 0. In the absence of any 
heating sources, the temperature of the gas can be shown to be ~ 
11K. 2 During the redshift range covered in the calculation, the 


2.3 Additional physics 

In the two-phase model, the photoionization rate T hi is nearly zero 
in the neutral HI regions, as most of the points are opaque to ioniz¬ 
ing radiation. On the other hand, the photoionization rate Th! 1 ^) 
inside HII regions is assumed to be homogeneous; a natural way 


2 The Compton scattering between the CMB photons and relic free elec¬ 
trons from cosmic recombination couples the cosmic gas temperature to the 
CMB one, down to redshift 1 + Zf ~ 1000(f2;,/i 2 ) 2 / 5 ((Peebles 1993)). 
Following that the temperature of the gas cools down adiabatically. So the 
gas temperature before the formation of any heating source is given by 
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temperature of the gas is low enough to neglect cooling as the cool¬ 
ing function is different from zero only for temperatures above 10 4 
K. 

Figure (2) shows the thermal evolution for neutral regions 
when we include X-ray heating with ax = 0.01. Assuming that 
only 1 per cent of the energy is emitted as X-rays, the temperature 
of the gas is equal to 0.76 K at z = 6. Even when we use a higher 
value of ax = 0.1, the temperature raises only to 2.8 K at z = 6. 

Fortunately for this work, it turns out that all the results are 
independent of the precise value of To for neutral regions as long 
as it is below 1500 K, with variations being less than the typical 
statistical variance. The reason for the insensitivity of our results 
to To can be understood in the following way: the value of To has 
two possible effects on the simulated spectra. The first effect is to 
determine the recombination rate of the ionized species and thus 
affect the neutral hydrogen fraction. However, the low ionization 
rates in the neutral regions imply very small ionized fraction and 
thus a very long recombination time, thus making the value of To 
irrelevant for calculating the neutral fraction. The second effect of 
To is to determine the widths of the lines through the Doppler pro¬ 
file. But here again, because of the large neutral fractions, the line 
profile is predominantly determined by the natural width (which 
shall be discussed in detail later) and thus the effect of To is again 
negligible. For the rest of the paper, we shall assume that the tem¬ 
perature To for neutral regions is IK. 

There are few more subtleties which need to be addressed 
while dealing with neutral regions along lines of sight. First, the 
volume filling factor Qmi(z) applies to three dimensional regions 
only, and hence one needs to translate this into a one dimensional 
filling factor qan(z) along different lines of sight in a consistent 
manner which takes into account the evolution in Qhii. This is a 
purely geometrical exercise and can be performed if one knows 
the geometry of the neutral regions. However, the value of the fill¬ 
ing factor Qhii does not uniquely determine the size and shape 
of the neutral regions; the detailed topology depends on the na¬ 
ture of sources, coupled with the density distribution of the IGM, 
and hence is non-trivial to take it into account analytically. On the 
other hand, numerical simulations still do not have enough dynamic 
range to address this issue for wide regions of parameter space. 
Given this, we devise an approximate method, described below, to 
calculate qmi(z) along different lines of sight. In addition, we also 
study different variations of our method accounting for different 
topologies of the neutral regions, and check whether our main con¬ 
clusions remain unchanged. 

The simplest method of distributing the neutral regions along 
different lines of sight is based on the assumption that the positions 
of the neutral regions in the three-dimensions are completely ran¬ 
dom and the regions are not correlated with the density field. This 
assumption seems to be quite reasonable at late stages of reioniza¬ 
tion as found in radiative transfer simulations, where most of the 
individual ionized regions have overlapped leaving neutral regions 
of random shapes and sizes with no significant clustering pattern. 
(For a visual impression see, for example, the maps in Ciardi, Fer¬ 
rara, & White 2003.) Of course, very high density regions, like col¬ 
lapsed structures or filaments, tend to remain neutral till late times, 
thus correlating the neutral regions with density field. However, 
these high density regions are not significant for the Lya forest, and 
hence can be ignored in our analysis. Nevertheless, we do check the 

To(zstart) = 7cmb(1 + Zf) [(1 + Zstart)/(1 + Zf)] 2 , where Tomb 
is the temperature of the CMB at z = 0. 


effects of clustering of neutral regions of large sizes and their cor¬ 
relation with the density field in Section 3.7. In addition we present 
the technical details of distributing the neutral regions along lines 
of sight and their physical properties in Appendix B. 


3 RESULTS 

In this Section, we present the main results of our analysis. In the 
first part, we shall put our model to test by comparing it with vari¬ 
ous available observations at z < 6. As we shall see, the model is 
quite successful in matching the observational data. Next, we shall 
discuss the predictions of the model at z > 6 in the second part, 
and then try to determine the ionization state of the IGM. 

3.1 Comparison of the model with observations at z < 6 

3.1.1 Gunn-Peterson optical depth ( tgp ) 

The first obvious test for our model would be to check whether 
it can match the mean GP opacity of the Lya and Ly (3 forests at 
z < 6. For this purpose, we use the data from S04, and thus the 
procedure for obtaining the mean transmitted flux from simulated 
spectra is similar to that work. We consider the emission redshifts 
of the 50 QSOs observed (2.31 < z em < 6.39) as mentioned 
in Tables 1 and 2 in S04. For each emission redshift, we simulate 
the Lya absorption spectra covering the wavelength range 1080 - 
1185 A. We then divide each spectra in seven parts of length 15 
A and compute the Lya Mean Transmitted Flux (MTF) for each 
part. These data points are then binned in a way such that each bin 
contains six points. For each bin, the mean transmission and the 
extremes of transmission have been computed and then assigned to 
the median redshift within the bin. The GP optical depth is defined 
as: tgp = — ln(MTF). The GP optical depth evolution for Lya 
transition is plotted in Figure 3 (left panel) as a function of the 
median redshift in each bin. The vertical error bars show the range 
of extremes of transmission within each bin, weighted on 100 LOS, 
translated to optical depth. 

In order to compare our results with observational ones (S04), 
each spectra have been convolved with a Gaussian whose FWHM 
is equal to 8 km s _1 , if the emission redshift z em is below 4, or 
equal to 56 km s _1 , if z em > 4. This procedure implies, in the first 
case, a resolution around 36000, since the observed spectra have 
been taken with the HIRES spectrograph on the Keck I telescope, 
while in the second case the mimicked resolution is 5300, the same 
of the ESI spectrograph on the Keck II telescope. The pixel size of 
the rebinning is 12 km s _1 . 

Extending the above procedure for Ly fi region of the absorp¬ 
tion spectra (corresponding to the rest wavelength range 980 - 1010 
A), we derive the total optical depth at a given redshift z from the 
sum of the direct Ly j3 absorption at that redshift and the Lya ab¬ 
sorption at redshift 1 + zp — + z), i.e., 

rtfp(z) = TLyp(z) + T Lya (z 0 ), (25) 

where TL ya and Tt, y p are given by equations (18) and (21) respec¬ 
tively. For each emission redshift, the absorption spectra in the 
wavelength range 980 - 1010 A is divided in two parts of 30 A 
each. As in the Lya, we obtain the evolution of Ly/3 MTF and the 
range of extreme values by binning the data. Note that, in order to 
calculate the Ly (3 flux distribution in the rest wavelength range 980 
- 1010 A (as discussed above), we need to estimate the Lya optical 
depth in the interval 827 - 1010 A [this follows trivially from the 
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Figure 3. Ly a (left panel) and Ly f) (right panel) GP optical depth compared with data from Songaila (2004). Solid (red) and dotted (blue) thick lines represent 
average results for ERM and LRM, respectively, on 100 LOS for each emission redshift; the thin lines denote the upper and lower transmission extremes in 
each bin, weighted on 100 LOS. 


expression (25) for Ly/3 optical depth]. The evolution of the MTF 
for Ly (3 is plotted in Figure 3 (right panel). The error bars show 
the range of extremes of transmission within each bin, translated to 
optical depth. 

We find that both the Lya and the Ly j3 MTFs are in excellent 
agreement 3 with observations at 3 < z < 6 . Though it might seem 
from Figure 3 that it is difficult to reach an optical depth as high as 
S04 at z = 6 with our models, it must be noted that our results are 
weighed over a large number of realizations. In fact, we find that 
there are lot of realizations which give very high optical depth at 
a = 6 . in complete agreement with S04. 

Note that in the MTF analysis of the Lya forest, the slope of 
the equation of state 7 — 1 has been treated as a (non-evolving) 
free parameter which has the best-fit value 7 = 1.3. Once that it is 
fixed in order to match the Ly a data, no any other free parameter 
can be tuned to obtain a good trend of Ly (3 MTF. The agreement 
of the MTFs in both the Lya and Ly/3 regions is thus an indirect 
confirmation that the lognormal model for density fields could be 
considered as a fair description of the mildly non linear regime. 

3.1.2 Probability Distribution Function (PDF) 

In this Section, we compute the Probability Distribution Function 
of the transmitted flux for the Lya forest and compare it with the 
observed Keck spectra of SDSS 1044-0125, SDSS 1306+0356, and 
SDSS 1030+0524 within redshift range 5.5 < z < 6.0 (F02). 
In order to compare with observations, we add the relevant ob¬ 
servational artifacts in our simulated spectra, i.e., we smooth the 
simulated spectra with a Gaussian filter of smoothing length a v = 
28 km s _1 (corresponding at a FWHM of 66 km s _1 ) and bin them 

3 The fact that there is hardly any difference between the early and late 
reionization models is related to the fact that the two models differ substan¬ 
tially only at redshifts above 6. 


in pixels of width 35 km s _1 . We then add noise (see Section 2.1.5) 
to the simulated Lya forest spectra corresponding to the observed 
data with cr no i Se = 0.02. Figure 4 shows the PDF of the transmit¬ 
ted flux, computed using 500 random realizations of the artificial 
spectra at the mean redshift 5.5, 5.7, 6.0 respectively. 

The flux PDF of the simulated spectra is consistent with the 
observational ones in all three redshift cases. Furthermore, as ex¬ 
pected, the agreement is quite good for both reionization models. 
Note that the difference in the photoionization rates of the two mod¬ 
els are typically 18%, 18% and 39% for redshifts 5.5, 5.7, 6.0 re¬ 
spectively (see middle panel of Figure 1); yet the two models seem 
to be indistinguishable. This implies that the PDF is not sensitive 
enough in discriminating between different evolution histories of 
the ionizing background. 

It is worth briefly mentioning here that F02 have used numer¬ 
ical simulations to compute the absorption spectra of high-redshift 
quasars in the Lya region and have found a rapid evolution of 
the volume-averaged neutral fraction of hydrogen at a < 6 (from 
/hi ~ 10 ~ 5 at a = 3 to /hi ~ 10 -3 at z = 6). This evolution 
has been interpreted as a signature of the end of reionization around 
z ~ 6 . However, we find that, in addition to late reionization mod¬ 
els (where /hi is evolving rapidly at z ss 6), early reionization 
models also give a good fit to the observed MTF and PDF at z < 6 . 
It is thus not possible to rule out early reionization models using 
only MTF and PDF statistics at z < 6 . 


3.1.3 Dark Gap Width Distribution (DGWD) 

At high redshifts, regions with high transmission in the Lya forest 
become rare. Therefore an alternative method to analyze the sta¬ 
tistical properties of the transmitted flux is the distribution of dark 
gaps first suggested by Croft (1998), defined as contiguous regions 
of the spectrum having an optical depth > than 2.5 over rest frame 
wavelength intervals greater than 1 A. In this Section we will com- 




























Constraining the reionization history with QSO absorption spectra 9 



0 0.2 0.4 0.6 0.8 1 0.2 0.4 0.6 0.8 1 0.2 0.4 0.6 0.8 1 

flux flux flux 


Figure 4. Probability distribution function of the transmitted flux at mean redshift 5.5. 5.7 and 6.0, respectively, compared with F02 Keck data. Solid (red) and 
dotted (blue) lines represent the ERM and the LRM, respectively. For each model, the thick line is the average over 500 LOS, the thin lines denote the cosmic 
variance. 


pare our results with observational data obtained by Songaila & 
Cowie (2002), hereafter SC02, analyzing 15 high-redshift QSOs 
whose emission redshifts lie in 4.42 < z < 5.75. In order to ob¬ 
tain a fair comparison with data, each simulated spectrum has been 
convolved with a Gaussian having FWHM equal to 60 km s -1 , re¬ 
sulting in a spatial resolution similar to the data obtained using ESI 
spectrograph. In Figure 5 we plot typical simulated line of sight 
spectra at redshifts 4.61 and 5.74. The black lines plotted imme¬ 
diately below the spectra show the regions identified as gaps in 
the Lya region and the ones below those show the gaps present 
in the Ly/3 region too. This Figure should be compared with Fig 2 
of SC02, showing good qualitative agreement between our results 
and observations. It is also clear from the Figure that the frequency 
and the width of the gaps increase from redshift 4.61 to redshift 
5.74. 

Figure 6 shows the DGWD for the redshift range 3.0 < 
Zabs < 5.5 (where z a b s is the redshift of the absorber), obtained 
from a sample of 300 LOS for each redshift bin. The distribution 
essentially measures the number of gaps having a certain width W a 
in the QSO rest frame within the Lya region. The results, as well as 
the statistical errors, obtained from our models are in good agree¬ 
ment with observational data over a wide redshift range. One can 
also see that the frequency of larger gaps increases as we go to 
higher redshifts. Comparing our results with hydrodynamical sim¬ 
ulations (Paschos & Norman 2005) we find that our results are in 
better agreement with observations. This is probably due to their 
limited box size (6.8 comoving /i _1 Mpc); in this case to simulate 
spectra covering a large redshift range (5.0-5.5, for instance) each 
line of sight has to cross the simulated volume more than once. So 
the presence of an under-ionized region could break a long dark 
gap in smaller ones. The advantages of using semi-analytical sim¬ 
ulation is that we can obtain the same length of the observed spec¬ 
trum (e.g. 100 A to compare with SC02) in one realization, instead 
of combining together various artificial spectra end to end. 

So far we have tested our models against the observational 
data available at z < 6. We have seen that both late and early reion¬ 
ization models are able to match (i) the MTF evolution both in the 
Lya or Ly/3 regions, (ii) the PDF of the transmitted flux and finally 


(iii) the DGWD. We can thus conclude that the results obtained at 
a < 6 do not allow to exclude the possibility that the universe has 
been reionized as early as at redshift 14. However, we have already 
discussed the fact that the difference between the two reionization 
scenarios are most substantial only at a > 6. It is thus important 
to see how the Lya forest at a > 6 can be used for distinguishing 
between the two different scenarios. This is what will be done in 
the next Section. 

3.2 Predictions for higher redshifts 

Let us now extend the analyses of the previous Section to spectra at 
z > 6 and try to determine whether the Lya forest is able to distin¬ 
guish between different models of reionization. Since the spectra 
are generally expected to be much darker at these high redshifts, it 
is clear that the MTF would not be able to distinguish between the 
two models (it is consistent with zero irrespective of the ionization 
state of the IGM). Hence, we start our discussion with the PDF of 
the transmitted flux for the Lya forest. 

3.2.1 Probability distribution function (PDF) 

As in the Section 3.1.2, we compute the PDF of the transmitted 
flux for 500 random LOS, with the corresponding cosmic variance. 
The results are shown in Figure 7 for mean redshifts of 6.25 and 
6.5, respectively. Interestingly, we find that even at redshifts higher 
than 6 (where the ERM and LRM differ quite substantially in their 
physical properties), the PDF is not able to differentiate between 
the two reionization scenarios mainly because of large cosmic vari¬ 
ance. This has to do with the fact that most of the pixels have flux 
consistent with zero for both reionization models and so the PDF 
essentially probes the noise distribution (which is independent of 
the physical state of the IGM). However, note that in the LRM we 
have no pixels with F > 0.8 at z — 6.25 and with F > 0.7 at 
z = 6.5 respectively, while there are pixels (though very few) with 
F as high as 0.85 at z = 6.25 and 0.8 at z — 6.5 for the ERM 
(these correspond to some peaks present in the ERM which are 
suppressed in the LRM). Whether this can discriminate between 
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Figure 5. Two simulated QSO spectra with emission redshift 4.61 (top panel) and 5.74 (bottom panel). The broken lines immediately below the spectra 
represents dark gaps in the Lya region and the ones below those show the analogous gaps in the Ly/3 region. The horizontal solid lines slightly above zero 
transmission represent the flux threshold (= 0.082) used for defining gaps. 


the two models is doubtful particularly because of the uncertainties 
in the continuum of the unabsorbed quasar spectra and the effects 
arising front atmospheric absorption. 

3.2.2 Dark Gap Width Distribution (DGWD) at z > 5.5 

The next statistics which can be used is the DGWD for the Lyo? for¬ 
est at redshifts higher then 5.5. For definiteness, we consider two 
redshift intervals: 5.7 - 6.3 and 6.0 - 6.6. The first case should be 
applicable to QSOs having emission redshift around 6.4, while the 
second case corresponds to an emission redshift « 6.7. In Figure 
8 , we plot sample spectra for the two different reionization models 
in the redshift range 5.7 - 6.3, with the upper (lower) panel cor¬ 
responding to ERM (LRM). The black lines plotted immediately 
below the spectra show the regions identified as gaps. Note that 


the two spectra have the same baryonic density distribution and 
differ only in the distribution of neutral regions. It is clear that at 
large values of rest frame wavelengths (A rf), say, Arf > 1150 
A (which corresponds to z > 6), there are substantial differences 
between the ERM and LRM. The LRM does not have the peaks 
at large redshifts which are present in the ERM, and hence one 
obtains gaps of much larger widths for the LRM. The reason for 
the suppression of the peaks in the LRM is twofold. Firstly there 
is an increase in the optical depth at the pixels where neutral re¬ 
gions are placed, thus decreasing the transmission. However, there 
is a second effect which seems to be more important which has to 
do with the damping profile of neutral hydrogen arising from nat¬ 
ural line width. This effect can suppress flux in regions which are 
not necessarily neutral but lie in the vicinity of a highly neutral re¬ 
gion. This can be understood from a close-up of the spectra shown 
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Figure 6. Dark Gap Width Distribution (DGWD) at redshift 3.5-5.5 compared with Songaila & Cowie (2002), denoted SC02 in the panels. Solid (red) and 
dotted (blue) lines represent the ERM and the LRM, respectively. For each model, the thick line is the average over 300 LOS, the thin lines denote cosmic 
variance. 


in Figure 9 where we have zoomed into a region between 1191 A 
< Arf < 1194 A. As before the upper (lower) panel corresponds 
to ERM (LRM) and the black lines plotted immediately below the 
spectra show the regions identified as gaps. We have also shown the 
positions of the neutral pixels by crosses within the Figure. Note 
that there are two prominent peaks in the ERM at Arf = 1192.2 
A and 1193.4 A respectively while they are completely suppressed 
in the LRM. However, note that there are no neutral pixels at the 
location of the peaks - they are actually suppressed by the damping 
wings of a small number of neutral pixels in the vicinity. Thus the 
damping wing of neutral regions can have a dramatic effect on the 
distribution of dark gap widths as shall be discussed next. 

The results for the distribution of dark gap widths for the two 
redshift ranges are plotted in Figure 10. We find that the dark gap 
width distributions for ERM and LRM are essentially the same (ac¬ 


counting for the cosmic variance) for W a < 40 A, where W a de¬ 
notes the gap width in the Lya forest. However, one should note 
that for larger W a the frequency of gaps differs substantially for 
different models, with the difference being quite obvious for the 
redshift range 6.0 - 6.6. As expected, LRM predicts an higher prob¬ 
ability to find larger gaps because of the neutral regions in the IGM. 
This difference in the two models at large gap widths can be used as 
a possible discriminator between early and late reionization. How¬ 
ever, it turns out that it is possible to devise a more sensitive statis¬ 
tics for this purpose which we discuss in the next subsection. 

3.3 Distribution of largest gaps 

In this Section, we present the most sensitive diagnostic for dis¬ 
tinguishing between different reionization scenarios using the Lya 









































12 


Gallerani, Choudhury & Ferrara 


n 

Dh 



flux flux 


Figure 7. Same as in Figure 4 but at mean redshifts 6.25 and 6.5. 


forest. We calculate the width of the largest gap for each 

of the 300 LOS generated from our models, and then compute the 
fraction of LOS having a particular value of the largest gap width. 
It is clear from the discussion on Figure 8 that the typical size of the 
largest gap along a LOS will be much larger in the LRM compared 
to ERM. The fraction of LOS having a given value of largest gap 
is shown in Figure 11 with the corresponding cosmic variance. It is 
clear that the distributions for the two models differ substantially; 
in particular, one should find gaps with W a > 50 A for the LRM 
along ~ 35 per cent of the lines of sight, while if the universe is 
ionized early, there should be no line of sight with a dark gap width 
> 50 A. This is a very stringent result, and can be used to rule out 
the early reionization scenario from observational data. Similarly, 
the absence of dark gap widths > 50 A can be used to rule out late 
reionization scenarios. 

As expected, the difference between the two reionization mod¬ 
els is more drastic in the highest redshift range 6.0 - 6.6. In partic¬ 
ular, we expect nearly half the lines of sight to have a gap of width 
as large as 80 A if the universe is in the pre-overlap stage, while no 
such line of sight should be observed if the IGM is ionized. Even 
if we take the statistical errors into account, we find that in order to 
validate the late reionization hypothesis, at least 40 per cent of the 
lines of sight should have dark gaps larger than 70 A in the redshift 
range 6.0 - 6.6. 

At present SDSS has already observed 9 QSOs above redshift 
6 , and thus one should be able to compute the distribution of the 
largest gap widths. As an example, a visual inspection of the spectra 
of QSO SDSS J1030+0524 (which shows the darkest GP trough till 
date) reveals that the size of the largest dark gap is < 40 A, which 
can be well explained both by ERM and LRM. 

3.4 Peak Width Distribution (PWD) 

Having identified a very useful statistics, the DGWD, we now in¬ 
troduce another possible analysis which can be thought as comple¬ 
mentary to the gap statistics. The Peak Width Distribution (PWD) 


allows us to measure the frequency and the width of those regions 
of the spectra characterized by a high transmission, i.e., a flux be¬ 
tween 0.08 and 0.8 over rest frame wavelength intervals greater 
than 0.2 A (which is roughly equal to 6 pixels of our rebinned spec¬ 
tra). Visually these regions would appear as isolated spikes in the 
spectra at high redshifts (in this sense the terms “spike” and “gap” 
can thought of as equivalent). The lower threshold flux is same as 
the one used as the upper threshold in the DGWD analysis, while 
the upper limit value of 0.8 have been chosen in order to avoid re¬ 
gions of full transmission which could be affected by atmospheric 
absorption (though this effect does not seem to have much effect on 
the statistics). Figure 12 shows the results for the PWD in the red¬ 
shift ranges 5.7 - 6.3 and 6.0 - 6.6, respectively with P a denoting 
the width of the peak. Similar to the DGWD, it is clear that for small 
peak widths, the errors are too large and thus do not allow to use 
this statistics for discriminating between the two models; however 
the distributions are quite different for peak widths larger than 1.2 
A. This suggests that the fraction of lines of sight having a largest 
peak width of a given value could be used as another discriminating 
statistics between the two models. 


The results for the fraction of lines of sight with a given value 
of the largest peak width are plotted in Figure 13. It seems 

that at z = 5.7—6.3, the probability of finding a line of sight having 
a peak width larger than 1.2 A is negligible in the LRM, while in 
an ERM peaks of this size seem to be present for 20 per cent of the 
lines of sight. The same effect is more evident in the redshift range 
6.0 — 6.6: there ERM predicts peaks of width ~ lA in 40 per cent 
of the lines of sight; on the contrary, the LRM predicts no peaks 
larger than 0.8 A. 


We believe that the distribution of peak widths can be used in 
a complementary way with the dark gap width statistics for con¬ 
straining the ionization state of the IGM at z > 6. 
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Figure 8. Comparison between two simulated spectra for different reionization scenarios, ERM (top panel) and LRM (bottom panel), in the redshift range 
5.7-6.3. Both these spectra are obtained from the same density distribution. 


3.5 Results for the Ly/3 region 

In addition to the Lya forest, one can also use the Ly/3 region of 
the absorption spectra to constrain the ionization state of the IGM 
at high redshifts. The advantage of using the Ly/3 absorption lines 
is that the absorption cross section is lower than the Lya one, and 
hence one finds some features of transmission within the spectra in 
Ly/3 region even when Lya transmission is zero. In this Section, we 
present our predictions for the Ly/3 forest at z > 6 . 

We have calculated the DGWD for the Ly/3 forest in the red- 
shift ranges of interest and found it to be quite similar to the Lya 
case. The distribution does show some differences between the 
reionization models at high values of gap widths, though the dif¬ 
ference is not as evident as in the case of Lya. We plot the fraction 
of LOS having a largest gap width of a given value in Figure 14, 
which corresponds to Figure 11 for Lya. Though the ERM and the 


LRM differ in their distributions for Ly/3 regions, we find that it is 
not as discriminating as in the case of Lya in the redshift intervals 
considered here. 

We have also computed the PWD distribution for the Ly/3 re¬ 
gion of the spectra. The broad conclusions are similar to those ob¬ 
tained from Lya regions, though the discrimination between LRM 
and ERM is reduced in the case of Ly/3. However, the usefulness 
Ly/3 statistics lies in the fact that these can be used as an indepen¬ 
dent check for the reionization models. 

3.6 Dark gaps in both Lya and Ly/3 regions 

In this Section we study the presence of dark gaps in both 
the Lya and the Ly/3 regions of the absorption spectra. In Fig¬ 
ure 15 we show the mean Lya against the mean Ly/3 opti¬ 
cal depth for different dark gap lengths, for both reionization 
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Figure 9. Same as in Figure 8, but zoomed into the spectral region 1191 A < Arp < 1194 A. The black lines plotted immediately below the spectra show the 
regions identified as gaps. We have also shown the positions of the neutral pixels by crosses. Even if neutral pixels are present only in the LRM, we draw them 
also in the ERM spectrum, in order to visualize their location. It is evident that there is suppression of the flux in the LRM even if there are no corresponding 
neutral pixels. This result, as discussed in detail in the text, is due to the damping wings of the neighboring neutral pixels. 
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models. Points and triangles in the figure represent dark gaps 
such that 0 < log 10 [Max{ITa,, Wp}] < 0.5 and 1.5 < 
log^pViaxlWa, Wp}] < 2, respectively, where Max{W a , Wp} 
is the width of the larger dark gap between the Lya and Ly/3. It is 
obvious from the figure (and also stressed by Paschos & Norman 
2005) that larger dark gaps correspond to higher optical depths in 
both models. 

There is one more interesting point to be noted from the figure. 
Conventionally the total Ly/3 optical depth is obtained from the Lya 
one, using the following relation: 

(4 = 0.16 r Lya (s) + tl ya(zp), (26) 

which assumes that T-L, y p(z) = 0.16TL ya ( 2 )- This assumption is 
true for low column density systems when the line profile of ab¬ 


sorption is determined by the velocity field and is same for Lya 
and Ly/3. In this case, the points in the T^ y g — 7L ya plane will 
be strictly bound by a lower envelope, which will correspond to a 
straight line having a slope of 0.16. This bound is shown in the fig¬ 
ure as the slanted solid line. Note that for ERM there are truly no 
points below this line. On the other hand, for LRM there are a lot 
of points below the solid line with slope 0.16. This is related to the 
fact that the absorption from neutral regions present in the LRM 
cannot be described by a simple Gaussian profile and one has to 
take into account the effect of damping wings. This means that, as 
already discussed in Section 2.1.4, the usual adopted way to com¬ 
pute the total Ly/3 optical depth from the Lya one using equation 
(26) it is not appropriate in general, particularly when the neutral 
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Figure 10. Dark Gap Width Distribution (DGWD) at redshift 5.7-6.3 (left) and 6.0-6.6 (right). Solid (red) and dotted (blue) lines represent the ERM and the 
LRM, respectively. For each model, the thick line is the average over 300 LOS, the error bars denote cosmic variance. 



Figure 11. Distribution of the largest dark gap widths W™ ax for 300 LOS in the redshift range 5.7 - 6.3 (left panel) and 6.0 - 6.6 (right panel) for ERM (solid 
red line) and LRM (dotted blue line). The vertical error bars denote the cosmic variance; the horizontal error bars show the bin size. 


fraction of the gas is high and the Lorentzian part of the line profile 
becomes important. 

3.7 Variations in the Late Reionization Models 

We now discuss certain other possibilities regarding the distribu¬ 
tion of neutral regions along lines of sight. So far we have been 
using LRM as the fiducial model for late reionization which as¬ 


sumes that the neutral regions are distributed randomly, and they 
have no correlation with the density field. However, this is not the 
only possible way to distribute the neutral pixels. Hence we study 
two variations of the LRM which are named LRMd (d=density) 
and LRMc (c=clustered). The LRMd is similar to LRM except that 
within a given redshift range the neutral pixels are correlated with 
the density field with high density regions being preferentially neu¬ 
tral. In the case of LRMc, we assume that the neutral regions are 
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Figure 12. PWD at redshift ranges 5.7-6.3 (left panel) and 6.0-6.6 (right panel). Solid (red) and dotted (blue) lines represent the ERM and the LRM, 
respectively. For each model, the thick line is the average over 300 LOS, the error bars denote cosmic variance. 



0 12 0 12 3 

pmax(^) P^ax(A) 


Figure 13. Same as in Figure 11 but for largest peak widths 


maximally clustered (i.e., they form a large coherent structure along 
the line of sight) when calculating the one-dimensional filling fac¬ 
tor, qmi. This assumption represents the most extreme alternative 
to the LRM. In constructing our models we do not expect to recover 
exactly the real distribution of neutral regions; however, as we are 
considering the most extreme cases we expect the actual distribu¬ 
tion to be somewhere between the two. In the LRMc we find that 
the IGM is characterized by highly clustered large neutral regions 
(of lengths as large as few tens of comoving Mpc) and the correla¬ 


tion of these regions with the density field does not have any effect 
on the simulated spectra. The technical details on how we generate 
these models are discussed in Appendix B. 

We start with the qualitative description of sample spectra for 
the three models as shown in Figure 16. We recall that all the three 
panels have the same baryonic distribution and same value of qHii 
along the line of sight (i.e., the number of neutral pixels, denoted by 
crosses, in the three panels are equal although it may not be visually 
obvious). 
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Figure 14. Same as in Figure 11 but for the Ly ft region. 



Figure 15. Scatter plot of the mean Lyct and the mean Ly ft optical depths for each dark gap. Points and triangles represent dark gap such that 
0 < Iog |q [Max{ Wry . IV'.[] < 0-5 and 1.5 < 1 °gio [Ma.x{ W a . Wp }] < 2, respectively, where Ma.x{lid,. I f A, } is the width of the largest between 
the Lya and Ly/3 dark gaps. The solid lines parallel to the axes represent Lyo and Ly/3 optical depths equal to 2.5, which acts as the lower limit for defining 
gaps. The slope of the slanted solid line is equal to the ratio (/ 1 . y ,r A i , y ^ ) / (/i , y ,, A i , y f , ) = 0.16. 


The first point to note is the similarity between the LRM and 
LRMd, implying that the correlation of the neutral segments with 
the density field does not have much of an effect on the Lya forest. 
Although there are small differences in the actual positions of the 
neutral pixels in the two models, the gap widths are exactly similar 
for this line of sight. There are differences in the widths of gaps for 


LRM and LRMd along other lines of sight but the variations are 
within the statistical errors. 

On the other hand, the spectrum for the LRMc is quite dif¬ 
ferent from the spectra of LRM or LRMd. This is expected as the 
LRMc is very different from the other two in its physical properties. 
As discussed in Appendix B, LRMc consists of very large neutral 
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Figure 16. Simulated spectra for three different models of late reionization. The top panel shows a line of sight spectrum for LRM (same as in the bottom 
panel of Figure 8); the middle and bottom panels show spectra along the same line of sight (i.e., having the same density distribution) for LRMd and LRMc 
respectively. The black lines plotted immediately below the spectra show the regions identified as gaps. The positions of the neutral pixels are identified by 
crosses. 


segments (up to 100 comoving Mpc), while the LRM and LRMd 
are characterized by numerous regions of smaller sizes. Since the 
neutral regions are highly clustered in LRMc, they leave large vol¬ 
umes of ionized IGM. Consequentially we find that a high fraction 
of lines of sight (about 70 per cent at z = 5.7 — 6.3 and more than 
80 per cent at 2 = 6.0 — 6.6) do not encounter any neutral seg¬ 
ments at all. Thus statistically LRMc is expected to be the closest 
to ERM and should be quite different from LRM and LRMd. Even 
when a line of sight encounters some neutral segments, it is more 
likely that the segments are clustered at a few places forming large 
regions of neutral IGM. This can be seen in the bottom panel of 
Figure 16 where we find that all the neutral pixels are clustered at 
the highest redshift contrary to LRM and LRMd where the neutral 
pixels are distributed throughout the line of sight. This has a severe 


effect on the distribution of gaps and peaks. For example, the peak 
around Arf = 1182 A present in the ERM (see top panel of Fig¬ 
ure 8) is completely suppressed in LRM and LRMd because of the 
randomly distributed neutral regions. However, the peak is present 
in the LRMc because the neutral pixels are distributed differently. 
This implies that LRMc would have gaps of smaller widths com¬ 
pared to LRM and LRMd and thus would be closer to the ERM in 
its properties. Furthermore, it is clear from Figure 16 that a large 
gap does not necessarily correspond to a large neutral region. In 
fact we find that smaller regions of neutral hydrogen (of sizes < 1 
comoving Mpc) dispersed along the line of sight are more effective 
in suppressing the flux and thus creating large dark gaps in the ab¬ 
sorption spectra compared to the larger clustered regions. Probing 
such small regions are quite difficult with cosmological simulations 
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Figure 17. Same as in Figure 11 but considering only late reionization models. Dotted (blue), long-dashed (magenta) and long-dashed (black) lines represent 
LRM, LRMd and LRMc respectively. 


as they are close to the resolution limits, thus semi-analytical stud¬ 
ies can be of more help in such cases. 


We can carry out more quantitative comparisons between the 
different reionization models. We focus on the fraction of lines of 
sight with a given value of largest gap width for the two redshift 
ranges. Our results are plotted in Figure 17. As expected, the differ¬ 
ence between LRM and LRMd is not statistically significant. The 
fraction of larger dark gaps in both panels is slightly lower in the 
LRMd with respect to the LRM. This result can be understood not¬ 
ing that in the LRMd the neutral regions are preferentially located 
in those pixels having an higher density, where the flux is already 
more likely suppressed. On the contrary the distribution for LRMc 
is quite different and is more similar to ERM (see Figure 11). The 
point to note is that in spite of such extreme (and somewhat un¬ 
physical) clustering, the LRMc is still different from ERM. For ex¬ 
ample, 10 per cent of the lines of sight have a largest gap width of 
60 A (80 A) in the range z = 5.7 — 6.3 (z = 6.0 — 6.6) for LRMc 
which are not present in the ERM. Thus even in the most extreme 
distribution of neutral regions, the ionization state of the IGM can 
be determined using the dark gap statistics. 


Moreover, it is also interesting to note that the distribution of 
the largest gap is quite different for LRM and LRMc - thus it might 
be possible to obtain some information on the clustering of neutral 
regions. For example, in a quite realistic situation where we are pro¬ 
vided with only, say, 10 QSO spectra with emission redshift above 
6 , we expect to find in the LRMc one LOS having a dark gap as 
large as 50 A (which would rule out ERM) and, in the same sam¬ 
ple, at least 5 LOS whose largest dark gap does not exceed 30 A 
(which would rule out LRM). In this case, we could conclude at 
the same time that the universe is in the pre-overlap stage and that 
the Ell regions are highly clustered. 


3.8 Variations in the resolution and signal to noise ratio 

We have carried out extensive checks on our predictions by vary¬ 
ing different observational and instrumental artifacts. In particular, 
we have varied the resolution and noise in the simulated spectra to 
verify if any of our conclusions change. 

Our results presented in the previous sections are based 
on a resolution of 5300, which corresponds to a FWHM of 
~ 60 km s -1 . We have checked our results for up to a resolu¬ 
tion as high as 40000 (corresponding to a FWHM of ~ 8 km s -1 ), 
which is similar to what is expected in very high quality spectra. 
The results, particularly the gap and peak width statistics for the 
Lya forest do show some variations when the resolution is high. 
However, we find that none of our conclusions get modified in a 
significant way. 

For the noise, we have been using a Gaussian random vari¬ 
ate having variance (j no i Se = 0.02. Decreasing the value of a no ise 
(which corresponds to higher signal-to-noise ratio) has no effect on 
the gap and peak statistics. However, if we increase the value of 
fTnoise such that it becomes close to the flux threshold (~ 0.08) 
used to define the dark gaps, we find that the occurrence of gaps 
(and peaks) changes drastically; there are various spurious spikes 
which arise because of high noise. Thus, to study dark gaps in ab¬ 
sorption spectra it is better to have a good signal-to-noise ratio; in 
case the signal-to-noise ratio is poor, it is necessary to use an higher 
flux threshold for defining the gaps. 


4 CONCLUSIONS 

In this work we have applied various statistical diagnostics to the 
transmitted flux of the Lya (and Ly/3) forest, with the aim of con¬ 
straining cosmic reionization history. Two different reionization 
scenarios, based on self-consistent models of Choudhury & Fer¬ 
rara (2005), have been considered: (i) an Early Reionization Model 
(ERM) characterized by a highly ionized IGM at z < 14, and (ii) 
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a Late Reionization Model (LRM) in which reionization occurs at 
z Rs 6. These reionization histories are the result of different as¬ 
sumptions about the type of ionizing sources considered. In both 
models, at redshifts z < 6 contributions to the UV background 
come from PopII stars and QSOs. The main difference between 
ERM and LRM is constituted by the presence of PopIII stars (not 
included in the LRM) which reionize the IGM at high redshift in 
ERM. We note that for the ERM the electron scattering optical 
depth is r e = 0.17. in agreement with the WMAP value, whereas 
for the LRM r e = 0.06, a value more than 2a away from the mean. 
The main aim of this work is to quantitatively compare the predic¬ 
tions from these two models, taken as representative of a wider 
class of early or late reionization scenarios, with the highest quality 
observational data. 

First, we have extensively tested our results against available 
data at z < 6 and found that ERM and LRM are equally good at 
explaining the observational results. In particular, they reproduce 
very well the observed Gunn-Peterson optical depth evolution, the 
Probability Distribution Function of the transmitted flux, and the 
Dark Gap Width Distribution. This comparison allows us to draw 
a few conclusions: (i) the Lya forest observations at z < 6 are 
unable to discriminate early vs. late reionization scenarios; (ii) the 
same data cannot exclude that reionization took place as early as 
by z « 14. 

In order to make progress higher redshift quasar spectra are 
necessary, which are likely to become soon available as SDSS 
is expected to find ~ 20 luminous quasars in the redshift range 
6 < z < 6 . 6 . By extending our model predictions to higher red- 
shifts we find that: (i) The mean and the PDF of the transmitted flux 
are essentially useless to constrain the ionization state at z > 6 as 
most of the pixels are consistent with zero transmission (indepen¬ 
dent of the ionization state), i.e. in practice these statistics probe 
the noise distribution; (ii) the dark gap width distribution (DGWD) 
is very sensitive to the reionization history. We expect at least 30 
per cent of the lines of sight (accounting for statistical errors) in 
the range z = 5.7 — 6.3 to have dark gaps of widths > 50 A 
(in the QSO rest frame) if the IGM is in the pre-overlap stage at 
z > 6 , while no lines of sight should have such large gaps if the 
IGM is already ionized. The constraints become more stringent at 
higher redshifts. We find that in order to discriminate between early 
and late reionization scenarios 10 QSOs should be sufficient for the 
DGWD to give statistically robust results, (iii) The statistics of the 
peaks in the spectra represents an useful complement to the dark 
gaps and can put additional constraints on the ionization state. As 
for the DGWD, we find that this statistics constrains reionization 
models more efficiently at high redshifts. In particular, if the uni¬ 
verse is highly ionized at z ~ 6 , we expect to find peaks of width 
~ lA in 40 per cent of the lines of sight, in the redshift range 
6.0 — 6 . 6 ; on the contrary, the LRM predicts no peaks larger than 

0.8 A. 

As an independent check of the models, we have extended all 
the above statistics to Ly (5 regions. It turn out that this diagnostics is 
less powerful than the analog Lya one to probe the ionization state 
of the IGM. Moreover, since the Ly/3 cross section is 5.27 times 
smaller than Lya one, the flux is always higher in the Ly/3 region 
than in the Ly a forest. This implies that to obtain Ly/3 constraints as 
stringent as those from Lycr, requires the analysis of QSOs spectra 
for z > 6 . 6 . 

We would like to comment on some additional issues concern¬ 
ing LRM. As discussed in the text, the hydrogen distribution in the 
LRM for low density IGM is characterized by two distinct phases 
at z > 6 , namely an ionized and a neutral phase. To model this two- 


phase IGM we have studied different topologies of neutral regions. 
Interestingly, the main conclusions of our work remain unchanged 
(see for instance Figure 17) irrespective of whether we assume that 
the positions of the neutral regions are completely random (LRM) 
or we correlate the HI regions along different lines of sight with 
the density field (LRMd). This result is basically due to the damp¬ 
ing wings of neutral regions, which are able to suppress the flux 
in regions of the spectra that are fully ionized (See Figure 9). On 
the other hand if the suppression of the flux does not necessarily 
correspond to the presence of neutral regions, it implies that QSO 
spectra might not be very useful to study in details the topology of 
the neutral hydrogen. 

However it is still possible to get some idea about the cluster¬ 
ing of the neutral regions provided we know the evolution of the vol¬ 
ume filling factor of ionized regions reasonably well. We have stud¬ 
ied an alternative distribution of the neutral regions, called LRMc, 
where we assume that neutral regions form the largest possible co¬ 
herent structure along the line of sight (sometimes as large as 100 
Mpc comoving which corresponds to almost 1/3 of the box). Be¬ 
cause of such high clustering, large volumes of IGM are left ion¬ 
ized, resulting in a large fraction of lines of sight which do not 
encounter any neutral region at all. Consequently, the distribution 
of the largest dark gap widths is biased towards lower widths com¬ 
pared to LRM. This means that the statistics of the largest dark gaps 
could also give an idea of the clustering in the HI regions. More¬ 
over, as is well known, the 21 cm signal front neutral hydrogen is 
sensitive to distribution of the HII regions (Furlanetto, Hernquist, 
& Zaldarriaga 2004; Furlanetto, Zaldarriaga, & Hernquist 2004). 
Hence 21 cm maps could be promising to study the correlation be¬ 
tween neutral regions and to obtain a more detailed and quantitative 
analysis of the size of neutral regions. 

Comparing the LRM and LRMc, we also find that a large gap 
does not necessarily correspond to a large neutral region. In fact 
smaller regions of neutral hydrogen (of sizes < 1 comoving Mpc) 
dispersed along the line of sight are more effective in suppressing 
the flux (because of damping wings) and thus creating large dark 
gaps in the absorption spectra compared to the larger clustered re¬ 
gions. Probing such small regions is quite difficult with cosmologi¬ 
cal simulations as they are close to the resolution limits, thus semi- 
analytical studies can be more helpful in such cases. Our method, in 
fact, does not suffer of spurious resolution effects. At high redshift, 
the length of dark gaps can be > 60 Mpc and hence the analysis 
requires a large sample of very long lines of sight. In order to cre¬ 
ate realizations of such long lines of sight, numerical simulations 
typically sample different regions of the box more than once (the 
so-called “oversampling” effect; Paschos & Norman 2005) or com¬ 
bine various spectra of smaller sizes end-to-end (F02). It is difficult 
to obtain the distribution of very large gaps (which are much larger 
than the box sizes) from such procedures as multiple ray passages 
through the same box could produce spectacular spurious artifacts 
in the gap statistics. For example, we find a much better match with 
the observations of dark gap width distribution when compared to 
the simulations of Paschos & Norman (2005), who have used a box 
of size 6.8 /i~ x Mpc. 

However, our method suffers from some limitations which are 
worth noting. First, we are not able to tackle the non-linearities in 
any self-consistent formalism - instead we assume a density distri¬ 
bution for the baryons (lognormal, in this case). Since the Lya and 
Ly/3 forests in the QSO absorption spectra arise from mostly quasi- 
linear regime, the approximation should be reasonable for comput¬ 
ing the transmitted flux. Second, it is nearly impossible to include 
full radiative transfer effects in our computation of the distribution 
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of the neutral regions and also we are not able to take into account 
the clustering of sources which is crucial to understand the proper¬ 
ties of ionized bubbles. However it is most likely that the location 
of ionizing sources might not be significantly correlated with neu¬ 
tral regions, particularly when one is dealing with high values of 
filling factor, as in our case (see, for example, the maps in Figure 1 
of Ciardi, Ferrara, & White 2003). Anyway it would be interesting 
to combine our approach in the distribution of neutral regions with 
radiative transfer simulations for a more detailed analysis of the 
absorption spectra, particularly in the vicinity of the QSO (which 
corresponds to a highly non-linear structure which is beyond the 
validity of the lognormal approximation). 

In the future, it would be most interesting to check our pre¬ 
dictions against a large sample of high signal-to-noise QSO data at 
redshift > 6. Our results show that in that case the dark gap statis¬ 
tics would provide a robust and independent probe of the reioniza¬ 
tion history. 
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APPENDIX A: LOGNORMAL APPROXIMATION VS. 
SIMULATIONS 

In this Appendix we compare the various Lya flux statistics, 
namely, the PDF and DGWD, computed using the lognormal model 
with those obtained from numerical simulations. Of course, a thor¬ 
ough verification of the lognormal approximation would require a 
comparison with full hydrodynamical simulations; however, since 
it has been found that the HydroPM simulations are able to repro¬ 
duce most of the physical properties of the Lya forest and are com¬ 
putationally much less expensive, we shall restrict ourselves to Hy¬ 
droPM simulations for the purpose of this work. We should men¬ 
tion here that the comparison presented here is mainly to justify 
the lognormal approximation for doing statistics with Lya forest at 
2 « 6 - this is not intended to be a rigorous justification for the 
lognormal approximation for the baryonic density field. 

While comparing our model with HydroPM simulations, one 
should keep in mind that the two models are not at all similar in 
their treatment of the reionization histories. While our models treat 
the reionization as an extended and gradual process with different 
ionized regions overlapping gradually, the simulations consider an 
abrupt reionization. Since sorting out such issues require much de¬ 
tailed effort, we restrict the comparisons in the post-reionization 
epoch (i.e., 2 < 6) and choose the parameters (cosmological 
model. To, 7 , r hi) in our code in a manner that they have the same 
values as HydroPM simulations at 2 < 6. This allows us to have a 
fair comparison between the lognormal model and numerical simu¬ 
lations with uncertainties due to different reionization histories un¬ 
der control. Note that, for the comparison in this Appendix, we have 
not used the reionization models (LRM or ERM) described in the 
paper - in fact, we have used the values from the HydroPM code. 

We have run HydroPM simulations with 128 3 particles in a 
12.8/i _1 Mpc box having a mesh size of 100/t -1 kpc. At various 
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Figure Al. Comparison of the PDF of transmitted flux obtained from lognormal model, shown as solid lines, with that obtained from HydroPM simulations, 
shown as points with error bars. The vertical error bars represent dispersion along different lines of sight, while the horizontal error bars denote the bin size. 
We show the results for three redshifts z = 4.5 (left panel), 2 : = 5.0 (middle panel) and z = 5.5 (right panel). 
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Figure A2. Same as in Figure Al but for DGWD. 


redshift outputs, random LOS were chosen along different direc¬ 
tions and the corresponding density and velocity fields were calcu¬ 
lated. It is then straightforward to obtain the transmitted flux along 
the LOS given the values of different parameters related to the 
IGM. In the case of the lognormal model, we use the same proce¬ 
dure outlined in Section 2. We use a box size and resolution similar 
to what is used in the numerical simulations so that the resolution 
effects are not substantial. We have not added any observational ar¬ 
tifacts (smoothing, noise etc) so that the comparison is restricted to 
physical properties of the IGM. We compute the PDF and DGWD 
in a manner similar to what is described in the main text. 

The results for PDF and DGWD are shown in Figures Al 
and A2 respectively for three redshifts 4.5, 5.0 and 5.5. At red- 
shifts lower than 4.5, the frequency of gaps reduces considerably 
and hence, in some sense, the usefulness of the gap statistics be¬ 
comes irrelevant. On the other hand, at higher redshifts, the size of 
the gaps become of the order or larger than the box size of the Hy¬ 
droPM simulations, and hence one needs to use considerably larger 


box sizes to carry out the comparison. Furthermore, at redshifts 
closer to 6, the physical properties of the IGM might be affected by 
details of the reionization history (particularly if the reionization is 
late), and hence we restrict ourselves to z < 5.5. 

As can be seen from the Figure, the agreement between log¬ 
normal approximation and HydroPM simulations is excellent at 
a = 5.5, both for the PDF and the DGWD. The agreement is 
slightly less for z = 5.0 and is acceptable (within la) at 2 = 4.5. 
It is thus clear that the lognormal model can be used reliably for 
Lya transmitted flux statistics, particularly at high redshifts. 


APPENDIX B: VOLUME FILLING FACTOR OF IONIZED 
REGIONS 

In Sections 2.1 and 2.2 physical properties of different reionization 
models have been discussed. In particular it results that in the Late 
Reionization Model (LRM) the IGM is characterized by two dis- 
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Figure A3. Cumulative distribution of the lengths of the neutral regions in the redshift ranges 5.7-6.3 (left panel) and 6.0-6.6 (right panel) for three different 
reionization models as indicated in the figure. 


tinct phases at z > 6, namely an ionized and a neutral phase. The 
aim of this Appendix is to explain how we distribute the fully neu¬ 
tral regions along different lines of sight in the LRM, taking into ac¬ 
count the scatter and evolution in the volume filling factor Qmi{z) 
of ionized regions. The method consists of two parts which are de¬ 
scribed in the two following subsections: 


B1 Calculation of the one dimensional filling factor qmi{z) 

This involves the geometrical translation of the three-dimensional 
filling factor Qmi(z) to a distribution of the one dimensional filling 
factor ()hiiP) along different lines of sight. This calculation can be 
performed once Qmi{z) and the geometry of the neutral regions 
are known. 

We start the procedure with a three-dimensional box with two 
spatial directions (representing two directions on the sky) and one 
direction along the redshift axis (representing the direction along 
the line of sight). The spatial extent of the box can be arbitrary as 
we are only doing a geometrical exercise. We divide the box into 
(thin) redshift slices and within each slice distribute a number of 
neutral regions according to the value of Qhii(z) at that redshift. 
Note that this procedure automatically takes into account the evo¬ 
lution in the volume filling factor. However, the value of Qmi(z) 
at a redshift does not contain information of the typical sizes and 
shapes of the neutral regions. We thus consider the two most ex¬ 
treme cases, one in which the neutral regions are distributed in a 
completely random manner with no clustering, while in the other 
case we put maximally correlated neutral regions which are of the 
largest size allowed by the value of Qmi(z) at that given redshift. 
In the first case, the box is characterized by numerous neutral re¬ 
gions of very small sizes, while in the second one, the box con¬ 
sists of bigger neutral blobs representing the maximum clustering 
of neutral regions. While in reality, none of these cases may rep¬ 
resent the geometry of the neutral regions, they nevertheless repre¬ 


sent the two extremes and thus we are sure that the actual case is 
somewhere between these. 

Once we have distributed the neutral regions within the three- 
dimensional box, we shoot numerous lines of sight through it. Each 
line of sight intersects different neutral regions at different red- 
shifts, and thus we can calculate the one-dimensional filling factor 
qmi{z) along each of them. Thus given a single Qmi(z), we build 
up a distribution of qmi(z) characterizing each line of sight. Using 
this distribution, we are not only able to take into account the evolu¬ 
tion in Qmi{z), but also the intrinsic scatter (or, cosmic variance) 
in the distribution of neutral regions. 

B2 Correlation between neutral regions and the density field 

Now that we have found out (at least) two ways of calculating 
qmi{z) along a given line of sight, we have to accordingly dis¬ 
tribute the neutral pixels along the same. For this, it is essential that 
we know whether the neutral regions have any correlation with the 
density field. The Lya forest arises mostly from baryonic overden¬ 
sities of a few (< 5) and it is not clear whether the neutral regions 
have any correlation with densities within such ranges (the corre¬ 
lation is much more established in case of higher densities). Hence 
we have tried both the options; in the first case, we have distributed 
the neutral pixels along the line of sight without any consideration 
for the density field, while in the second case we distribute the neu¬ 
tral pixels such that high density regions are preferentially neutral 
at the same time preserving the evolution trend of gmi(~). 

B3 Different LRMs 

Since we have two ways of calculating the one-dimensional filling 
factor and furthermore have the freedom in choosing whether the 
neutral regions are correlated with the density field, we can devise 
various LRMs which will cover all the extreme possibilities of dis¬ 
tributing the neutral regions. 
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• LRM: While computing the distribution of the one¬ 
dimensional filling factor, we assume that the neutral regions are 
distributed randomly, and while distributing the neutral pixels along 
lines of sight, we assume that they have no correlation with the den¬ 
sity field. This acts as the fiducial model for our paper. 

• LRMd: This is similar to LRM except that within a given 
redshift slice the neutral pixels are correlated with the density field 
with high density regions being preferentially neutral. 

• LRMc: In this model we want to reach the maximum level of 
clustering for neutral regions. To do this we assume that neutral re¬ 
gions are coherent structures (resembling filaments extended along 
the lines of sight) which preserve the evolution of the volume filling 
factor. Once we assume this, we find that the IGM is characterized 
by highly clustered large neutral regions (of lengths as large as few 
tens of comoving Mpcs) and the correlation of these regions with 
the density field does not have any effect on the simulated spectra. 

To understand the physical properties of the different LRMs, it 
is useful to compute the distribution of sizes of the neutral regions 
along all the LOS. The number of regions N(> L) along different 
lines of sight having a length greater than L for two redshift in¬ 
tervals of interest is plotted in Figure A3. We have normalized the 
distribution such that the total length of the line of sight is 1/t” 1 
Gpc, so as to compare our results to those obtained from simula¬ 
tions (Nusser et al. 2002). 

The first obvious fact to note by comparing the two panels of 
the Figure is that, for a given model, the number of neutral regions 
with comparatively larger sizes increases as one goes to higher 
redshifts. Furthermore, it is also clear that the LRMc consists of 
very large regions (~ 10-100 h comoving Mpc) as it represents 
the model with maximum clustering of neutral regions. Among the 
other two models (LRM and LRMd), the LRMd has regions of rel¬ 
atively larger lengths; the reason is because of the correlation with 
density field. The LRM contains no clustering of the neutral re¬ 
gions and thus, as expected, is characterized by numerous regions 
of small sizes (< few comoving Mpcs). 

As an independent check on our procedure, Figure A3 can be 
compared with Fig. 1 of Nusser et al. (2002) in a qualitative man¬ 
ner. Nusser et al. (2002) use N-body simulations coupled to a semi- 
analytical galaxy formation model and various models of propaga¬ 
tion of ionization fronts to calculate the sizes of neutral segments 
along lines of sight. Since our method for calculating the reioniza¬ 
tion history is quite different from theirs, it is not possible to carry 
out a more quantitative comparison. Even though our modelling of 
neutral regions predicts a larger number of small regions than sim¬ 
ulations, we have to keep in mind that our resolution in computing 
the QSO spectra is much higher than simulations (in our case it is 
~ O.OT/W 1 Mpc while in the simulations of Nusser et al. 2002 it 
is ~ 0.55/t -1 Mpc). As also claimed by Nusser et al. (2002), the 
lengths of neutral regions depend on the resolution adopted in the 
computation, i.e., increasing the resolution increases the number of 
small regions. As a check, when we compute the distribution of 
neutral regions using a resolution close to Nusser et al. (2002) one, 
we find that our results are in good agreement with simulations in 
the LRM/LRMd cases, thus supporting the reliability of our ap¬ 
proximate method of distributing neutral regions, while LRMc re¬ 
sults to be quite distant from the simulation results as it produces 
segments as large as 100 /i _1 comoving Mpcs which are never seen 
in simulations; nevertheless we study it as an extreme case. 

We end this Appendix pointing to the fact that the numeri¬ 
cal simulations cannot probe lengths much smaller than 1 /i -1 co¬ 
moving Mpc because of resolution effects; however with our semi- 


analytical models, it is possible to probe neutral regions with sizes 
as small as 0.04 /i -1 comoving Mpc. Interestingly, it turns out that 
these small regions, distributed randomly, are as effective as larger 
regions in suppressing the flux of the quasar spectra. This issue has 
been discussed in great detail in Section 3.7. 



